Abstract

Dynamic activity involving social networks often has distinctive temporal patterns that can be exploited in situations involving incomplete information. Gang rivalry networks, in particular, display a high degree of temporal clustering of activity associated with retaliatory behavior. A recent study of a Los Angeles gang network shows that known gang activity between rivals can be modeled as a self-exciting point process on an edge of the rivalry network. In real-life situations, data is incomplete and law-enforcement agencies may not know which gang is involved. However, even when gang activity is highly stochastic, localized excitations in parts of the known dataset can help identify the gangs responsible for unsolved crimes. Previous work successfully exploited the observed clustering in time of the data to identify gangs responsible for unsolved crimes. However, the authors assumed that the parameters of the model are known, when in reality they must be estimated from the data itself. We propose an iterative method that simultaneously estimates the parameters of the underlying point process and assigns weights to the unknown events with a directly calculable score function. We present results on the parameter estimates, the inferred weights, error propagation, convergence, and runtime.
Introduction

In this work we focus on data sets of events involving rival gangs on a social network. Each event in the data set corresponds to a crime that occurs at a specified time and involves a pair of rival gangs. A subset of these events are unsolved crimes in which one or both of the rival gangs are not known. The method developed in this paper could be applied broadly to any social network involving activity in time between pairs of nodes on the network. However, our interest in the problem arose from examining data from the Hollenbeck Division of the Los Angeles Police Department, home to 29 street gangs with a well-known rivalry network [1-3].

Unlike other methods used to address incomplete data relating to social networks [4,5], the question at hand is not whether a rivalry exists, but rather to which rivalry a violent event belongs. This structure of between-gang rivalries can be viewed as a social network [6], often embedded in space [1,2]. Violent events involving gangs tend to be dyadic, so we can formulate these events as a realization of a stochastic process occurring on the edges of the rivalry network. Each edge in the network carries its own stochastic process. Although we use identical parameters on every edge to generate synthetic data, the method does not assume that the parameters generating each process are identical.

The first step in inferring the affiliation of the violent events is to understand the underlying stochastic process. This requires us to capture the behavior of criminal activity through computational means, much as in [7-9]. Methods to model gang violence mathematically have recently been proposed in the literature. The authors of [10] employ an agent-based model to investigate the geographic influences on the formation of the gang rivalry structure observed in Hollenbeck. These authors consider the long-term structure of the rivalry network embedded in space. For the violence along each rivalry, however, a shorter timescale must be considered.

Violence among gangs exhibits retaliatory behavior [11]. In other words, given that an event has occurred between two gangs, the likelihood that another event will occur shortly afterwards increases. A problem such as this is modeled naturally by a self-exciting point process. Interestingly, these models were first used to analyze earthquakes [12-15]. Since then, they have been used to model financial contagion in credit markets [16,17], viral videos on the web [18], terrorist activity in Indonesia [19], and the spread of infectious disease [20]. In this analysis we limit the scope of our model to time only, thus providing a baseline model.

The authors of [21] and [22] have successfully modeled pairwise gang violence as a Hawkes process [23]. Every event is associated with exactly one rivalry, or edge of the social network. The violence on each edge, k, is assumed to have the conditional intensity

$$\lambda_k(t \mid \mathcal{H}_{T,k}) = \mu_k + \alpha_k \sum_{t > t_i} \omega_k e^{-\omega_k (t - t_i)}. \tag{1}$$

In this Hawkes process, the intensity $\lambda_k(t \mid \mathcal{H}_{T,k})$ depends on the history of the process, $\mathcal{H}_{T,k} = \{t_1, t_2, \ldots, t_{M_k}\}$, where $M_k$ is the number of events in process k. In this framework, the observation window $[0, T]$ is the same for every process in the network. However, the number of events in each process, $M_k$, is stochastic and therefore varies from process to process. In practice the final time, T, is determined by the end of the data collection period. Further, the edges of the window introduce boundary effects, which are adjusted for in the parameter approximation; see Equations 10 and 11.

The background rate of the process is given by the constant $\mu_k$. In the context of gang rivalries, background events can be thought of as random occurrences between rival gangs that trigger retaliatory events. The expected number of offspring of any event is given by the constant $\alpha_k$, and the rate at which the intensity decays back to the background rate is $\omega_k$. Offspring events, in this context, can be interpreted as retaliatory events. Larger values of $\mu_k$ and $\alpha_k$ produce more background and offspring events, respectively. Larger values of $\omega_k$ do not change the expected total number of events, but rather the amount of clustering in time.
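To make the roles of $\mu_k$, $\alpha_k$, and $\omega_k$ concrete, the following is a minimal Python sketch of the conditional intensity in Equation 1, together with the standard stationary approximation $\mu_k T / (1 - \alpha_k)$ for the expected number of events on $[0, T]$ (valid for $\alpha_k < 1$, ignoring edge effects). The function names are illustrative and are not part of the original method.

```python
import math
from typing import Sequence


def intensity(t: float, history: Sequence[float], mu: float, alpha: float, omega: float) -> float:
    """Conditional intensity lambda_k(t | H) of Equation 1 with an exponential kernel."""
    # Only events strictly before t excite the process.
    excitation = sum(omega * math.exp(-omega * (t - ti)) for ti in history if ti < t)
    return mu + alpha * excitation


def expected_count(T: float, mu: float, alpha: float) -> float:
    """Approximate expected number of events on [0, T] for a stationary process.

    Each background event has alpha + alpha**2 + ... = alpha / (1 - alpha)
    descendants on average, so the total is mu * T / (1 - alpha). This requires
    alpha < 1 and ignores offspring that would fall after T.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for a stationary process")
    return mu * T / (1.0 - alpha)


if __name__ == "__main__":
    print(intensity(3.0, [0.0, 2.0], mu=0.01, alpha=0.5, omega=0.1))
    print(expected_count(1000.0, mu=0.01, alpha=0.5))
```

Note that `omega` does not enter `expected_count`: changing it only redistributes the same expected number of offspring in time, which is why larger $\omega_k$ affects clustering rather than the total count.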

The authors of [24] develop a mathematical framework to solve the incomplete data problem observed in gang violence data sets. In their work they use an optimization strategy that computes weights to infer the rivalry affiliation of the incomplete data. In this formulation the authors prove that their optimization has a unique solution under mild constraints. This is a substantial contribution to inferring the affiliation of unknown violent events. However, the authors of [24] assume that the process parameters are known, an assumption that is often not feasible in practice. Further, finding the weights requires solving a computationally expensive optimization problem.

We propose an iterative method that (A) estimates the process parameters, assuming the data is generated by the process defined in Equation 1, and (B) infers the process affiliation of the unknown events via a direct method of computation. We iterate between (A) and (B) until the estimates for the unknown events converge. We call this the Estimation & Score Algorithm (ESA). The details of the ESA are described in Section “The Estimation & Score Algorithm (ESA)”. The ESA is tested on simulated data in Section “Results”, with an analysis of parameter estimation in the presence of incomplete data (see Subsection “Estimation analysis”) and a comparison of the proposed score functions with the Stomakhin-Short-Bertozzi (SSB) method of [24] (see Subsection “Updating Weights analysis”). Subsection “Runtime Analysis” compares the runtime of the Stomakhin-Short-Bertozzi method and the Forward Backward score function used to update the weights, and Subsection “Convergence Results” analyzes the convergence of the Estimation & Score Algorithm. This method solves the more realistic problem of estimating both the process parameters and the weights. Further, the weight updates are computed directly, which avoids the costly optimization scheme used in [24]. The approach also admits many natural extensions. A final discussion of the results and future work is presented in Section “Discussion and Future Work”. As in [24], we do not use field data in this paper; rather, we generated point process data using parameters similar to those observed in the field data for Hollenbeck [22]. Using simulated data to test the algorithms gives us ground truth against which to evaluate the performance of the method.

Problem Formulation

The data is assumed to lie on a known social network containing K processes, where each of the K processes is a pairwise rivalry between two gangs. Among the events on this network, there are a total of N events whose time is known but whose process affiliation is not. These events are referred to as unknown events. Each of the N unknown events is placed into each of the K processes. Since the process affiliation is not known for all of the events in the network, each event is given an associated weight, $S_{i,k}$, where i indexes the events of the kth process. If the event is known, $S_{i,k} = 1$. If it is unknown, it is assigned a number between 0 and 1 by our algorithm. We enforce the constraint that $\sum_{k=1}^{K} S_{i,k} = 1$.

A simplified representation of our problem formulation can be found in Figure 1. The known events are represented by circles and the unknown event is represented by a triangle. Since we do not know the affiliation of the triangle event, it is placed in all of the processes. We emphasize that this represents our lack of information about which rivalry the event belongs to.

Figure 1 Simplified representation of the rivalry network with known (circle) and unknown (triangle) events. The known events are depicted with circles and the one unknown event is depicted with a triangle. Since we do not know the affiliated process of this event, we place it in all processes. Associated with this event is a weight $S_{i,k} \in [0, 1]$ such that $\sum_{k=1}^{K} S_{i,k} = 1$.

As indicated in Figure 1, the events in each process are indexed by increasing time, $t_1 < t_2 < t_3 < \cdots < t_{M_k}$. A consequence of this ordering is that the same unknown event may have different indices in different processes. In Figure 1, for example, the triangle is the third event of the first process, $e_{1,3}$, but the second event of the Kth process, $e_{K,2}$. The local index of an unknown event in each process is easy to keep track of.
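The bookkeeping described above can be sketched in Python. In this minimal sketch (the function and dictionary field names are hypothetical), every unknown event is inserted into every process, each process is re-sorted by time, and the local index of each unknown event is recorded per process; weights start at 1 for known events and 1/K for unknown events, matching the initialization described in Section “Initialization”.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def place_unknown_events(known: Sequence[Sequence[float]], unknown: Sequence[float]) -> List[Dict]:
    """Insert every unknown event into every process and record its local index.

    known[k] holds the event times of process k; unknown[n] is the time of unknown event n.
    """
    K = len(known)
    processes = []
    for k in range(K):
        # (time, unknown_id); unknown_id is None for known events.
        events: List[Tuple[float, Optional[int]]] = (
            [(t, None) for t in known[k]] + [(t, n) for n, t in enumerate(unknown)]
        )
        events.sort(key=lambda e: e[0])  # index events by increasing time within process k
        processes.append({
            "times": [t for t, _ in events],
            # Initial weights: 1 for known events, 1/K for unknown events.
            "weights": [1.0 if n is None else 1.0 / K for _, n in events],
            # local_index[n] = position of unknown event n inside process k.
            "local_index": {n: i for i, (_, n) in enumerate(events) if n is not None},
        })
    return processes


if __name__ == "__main__":
    # Mirrors Figure 1: the triangle is the third event of one process and the second of another.
    procs = place_unknown_events([[1.0, 2.0, 4.0], [1.5, 6.0]], [3.0])
    print([p["local_index"] for p in procs])
```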


The Estimation & Score Algorithm (ESA)

The proposed Estimation & Score Algorithm can be broken into three basic stages: initialization, parameter estimation, and updating the weights. The method is summarized in Figure 2.


[Figure 2 flow chart: Normalize Weights, $S_{i,k} = q_{i,k} / \sum_{k=1}^{K} q_{i,k}$; Did the Weights Converge? No: iterate; Yes: End.]

Figure 2 Flow chart of the Estimation & Score Algorithm. There are two ways to implement this method. The first (left of initialization) is the algorithm used when given an incomplete data set. The second (right of initialization) is the algorithm used in this paper to simulate the data and test the components of the ESA. The two main phases of the algorithm are the Estimation phase (see “Parameter Estimation”) and the Update Weights phase (see “Updating weights”).
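The loop in Figure 2 can be expressed as a small generic driver. This is a sketch under stated assumptions: `estimate` and `score` are hypothetical callables standing in for the parameter estimation phase and for any of the score functions, the convergence test (largest absolute change in any weight below a tolerance) is our own choice, and the equal-weight fallback for an all-zero score row is an assumption not taken from the source.

```python
from typing import Any, Callable, List, Sequence, Tuple

Weights = List[List[float]]  # weights[n][k]: weight of unknown event n in process k


def normalize(q_row: Sequence[float]) -> List[float]:
    """S_{i,k} = q_{i,k} / sum_k q_{i,k}, the normalization step in Figure 2."""
    total = sum(q_row)
    if total <= 0.0:
        # Assumption: fall back to equal weights if every score is zero.
        return [1.0 / len(q_row)] * len(q_row)
    return [q / total for q in q_row]


def esa(estimate: Callable[[Weights], Any],
        score: Callable[[Any, Weights], Weights],
        n_unknown: int, n_proc: int,
        tol: float = 1e-6, max_iter: int = 100) -> Tuple[Any, Weights, int]:
    """Alternate parameter estimation and weight updates until the weights stop changing."""
    S: Weights = [[1.0 / n_proc] * n_proc for _ in range(n_unknown)]  # S_{i,k} = 1/K
    params = None
    for it in range(1, max_iter + 1):
        params = estimate(S)                   # parameter estimation phase
        q = score(params, S)                   # update weights phase: q_{i,k}
        S_new = [normalize(row) for row in q]  # renormalize each unknown event's row
        change = max((abs(a - b) for new, old in zip(S_new, S) for a, b in zip(new, old)),
                     default=0.0)
        S = S_new
        if change < tol:                       # "Did the weights converge?"
            return params, S, it
    return params, S, max_iter


if __name__ == "__main__":
    fixed_scores = [[2.0, 1.0, 1.0], [0.0, 3.0, 1.0]]
    print(esa(lambda S: None, lambda p, S: fixed_scores, n_unknown=2, n_proc=3))
```

In practice `score` depends on the current weights through the re-estimated parameters, which is what makes the iteration nontrivial; the toy call above uses fixed scores only to exercise the loop.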
Initialization

For this paper, there were two ways of initializing the Estimation & Score Algorithm. The first is used to infer rivalry affiliation given field data. After importing the data, the unknown events are identified and placed into each of the K processes. The weights $S_{i,k}$ must also be initialized: if the event is known, then $S_{i,k} = 1$; if the event is unknown, then $S_{i,k} = 1/K$.

An alternate initialization uses simulated data to test the components of the Estimation & Score Algorithm. In this case, data is generated from K independent Hawkes processes with given $\mu_k$, $\alpha_k$, and $\omega_k$. From these data, N events are chosen at random from the network and marked as unknown. These N unknown events are then placed into each of the K processes. The weights are initialized such that $S_{i,k} = 1$ for known events and $S_{i,k} = 1/K$ for unknown events. This initialization process is used in this paper to test the method and produce the results in Section “Results”.
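The simulated initialization can be sketched as follows. This is an illustration under assumptions rather than the authors' code: each process is simulated with Ogata's thinning algorithm (our choice of simulation method; the source does not state how its data were generated), and N events are then drawn uniformly at random from the whole network and hidden, keeping their true labels as ground truth. Function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def simulate_hawkes(mu: float, alpha: float, omega: float, T: float,
                    rng: random.Random) -> List[float]:
    """Ogata thinning for the exponential-kernel Hawkes process of Equation 1 on [0, T]."""
    times: List[float] = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its value just after t
        # is an upper bound until the next accepted event.
        lam_bar = mu + alpha * omega * sum(math.exp(-omega * (t - s)) for s in times)
        t += rng.expovariate(lam_bar)
        if t > T:
            return times
        lam_t = mu + alpha * omega * sum(math.exp(-omega * (t - s)) for s in times)
        if rng.random() * lam_bar <= lam_t:  # accept with probability lam_t / lam_bar
            times.append(t)


def simulate_network(K: int, mu: float, alpha: float, omega: float, T: float,
                     n_unknown: int, seed: int = 0
                     ) -> Tuple[List[List[float]], List[float], List[int]]:
    """K independent processes; n_unknown events hidden at random, true labels kept."""
    rng = random.Random(seed)
    processes = [simulate_hawkes(mu, alpha, omega, T, rng) for _ in range(K)]
    pool = [(t, k) for k, ts in enumerate(processes) for t in ts]
    hidden = rng.sample(pool, n_unknown)  # raises ValueError if the network has too few events
    hidden_set = set(hidden)
    known = [[t for t in ts if (t, k) not in hidden_set] for k, ts in enumerate(processes)]
    unknown_times = [t for t, _ in hidden]
    true_labels = [k for _, k in hidden]
    return known, unknown_times, true_labels


if __name__ == "__main__":
    known, unknown_times, labels = simulate_network(5, 0.01, 0.5, 0.1, 5000.0, 15, seed=1)
    print([len(ts) for ts in known], len(unknown_times))
```

The returned known lists and unknown times can then be merged into every process and weighted as described above.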


Parameter Estimation

In the absence of unknown events, there are both parametric [12] and nonparametric [25-28] ways to model the underlying stochastic process on each edge of the social network. For this work, we chose a parametric form for the triggering density to validate the model, but the results could be extended to the nonparametric case. As is usual with nonparametric estimates, speed would be sacrificed for flexibility.

For this paper, the data is assumed to be a realization of Equation 1, and the parameters are estimated using a method similar to the Expectation Maximization (EM) algorithm [29]. An EM-like approach is taken because of the branching structure present in a Hawkes process. In such a process each event is either a background event or a response event. However, given a realization from this process, it is not immediately obvious whether an event is a background or a response event. We can view this information as a hidden variable that we must estimate. In this way, every event in each of the K processes is assigned a probability $P^k_{i,j}$. The probability that event i is a background event is denoted $P^k_{i,i}$, and the probability that event i caused event j is denoted $P^k_{i,j}$, where $t_i < t_j$. We then alter the EM estimate of each parameter to include the weights for the unknown events; when all the events are known, the altered formulas coincide with the classical ones. This section derives the EM estimates in the presence of incomplete data.

The classical log-likelihood function $l_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k)$ for a general point process with a fixed window $[0, T]$ is

$$l_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k) = \sum_{j=1}^{M_k} \log \lambda_k(t_j \mid \mathcal{H}_{T,k}) - \int_0^T \lambda_k(t \mid \mathcal{H}_{T,k}) \, dt. \tag{2}$$

Incorporating the branching structure into the log-likelihood function, the event association is added as a random variable $X_{i,j}$ such that

$$X_{i,j} = \begin{cases} 1 & \text{if event } i \text{ caused event } j \text{ and } i \neq j, \\ 1 & \text{if event } i \text{ is a background event and } i = j, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

This branching allows us to separate the events associated with the background $\mu_k$ from those associated with the response $g_k(t) = \alpha_k \omega_k e^{-\omega_k t}$. This leads to the altered log-likelihood function

$$
\begin{aligned}
l_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k) ={}& \sum_{i=1}^{M_k} X_{i,i} \log(\mu_k) - \int_0^T \mu_k \, dt \\
&+ \sum_{i=1}^{M_k} \left\{ \sum_{j=i+1}^{M_k} X_{i,j} \log\left(\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}\right) - \int_0^{T - t_i} \alpha_k \omega_k e^{-\omega_k s} \, ds \right\}.
\end{aligned}
\tag{4}
$$
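For the exponential kernel, both terms of Equation 2 can be evaluated in closed form. The sketch below (the function name is hypothetical) uses the standard recursion $A_j = e^{-\omega_k (t_j - t_{j-1})}(1 + A_{j-1})$ for the excitation sum, so the log-likelihood costs $O(M_k)$ rather than $O(M_k^2)$; it assumes strictly increasing event times in $[0, T]$.

```python
import math
from typing import Sequence


def hawkes_loglik(times: Sequence[float], T: float, mu: float, alpha: float, omega: float) -> float:
    """Equation 2 for one process with strictly increasing event times in [0, T].

    A_j = sum_{i<j} exp(-omega (t_j - t_i)) is updated recursively as
    A_j = exp(-omega (t_j - t_{j-1})) * (1 + A_{j-1}), so lambda(t_j) = mu + alpha*omega*A_j.
    """
    log_term = 0.0
    A = 0.0
    prev = None
    for t in times:
        if prev is not None:
            A = math.exp(-omega * (t - prev)) * (1.0 + A)
        log_term += math.log(mu + alpha * omega * A)
        prev = t
    # Compensator: integral of lambda_k over [0, T], closed form for the exponential kernel.
    compensator = mu * T + alpha * sum(1.0 - math.exp(-omega * (T - t)) for t in times)
    return log_term - compensator


if __name__ == "__main__":
    print(hawkes_loglik([0.5, 1.2, 3.0, 3.1, 8.0], T=10.0, mu=0.3, alpha=0.6, omega=1.5))
```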


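Taking the expectation of $l_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k)$ with respect to $X_{i,j}$ results in

$$
\begin{aligned}
E_X\left[l_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k)\right] ={}& \sum_{i=1}^{M_k} P^k_{i,i} \log(\mu_k) - \int_0^T \mu_k \, dt \\
&+ \sum_{i=1}^{M_k} \sum_{j=i+1}^{M_k} P^k_{i,j} \log\left(\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}\right) - \sum_{i=1}^{M_k} \int_0^{T - t_i} \alpha_k \omega_k e^{-\omega_k s} \, ds.
\end{aligned}
\tag{5}
$$

In the EM algorithm, the quantity $E_X[l_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k)]$ is maximized with respect to each of the variables $\mu_k$, $\alpha_k$, $\omega_k$ given the data $\mathcal{H}_{T,k}$. This leads to the EM estimates

$$\mu_k = \frac{\sum_{i=1}^{M_k} P^k_{i,i}}{T}, \qquad \alpha_k = \frac{\sum_{i<j} P^k_{i,j}}{M_k - \sum_{i=1}^{M_k} e^{-\omega_k (T - t_i)}}, \tag{6}$$

$$\omega_k = \frac{\sum_{i<j} P^k_{i,j}}{\sum_{i<j} (t_j - t_i) P^k_{i,j} + \alpha_k \sum_{i=1}^{M_k} (T - t_i) e^{-\omega_k (T - t_i)}}, \tag{7}$$

where $P^k_{i,j}$ is defined by

$$P^k_{i,i} = \frac{\mu_k}{\lambda_k(t_i \mid \mathcal{H}_{T,k})}, \qquad P^k_{i,j} = \frac{\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}}{\lambda_k(t_j \mid \mathcal{H}_{T,k})} \tag{8}$$

for $t_i < t_j$. The EM algorithm then becomes a matter of iterating between estimating the probabilities and the parameters. It has been proven that this algorithm converges under mild assumptions [29]. Further, the boundary terms in Equations 6 and 7 adjust for boundary effects.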
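A minimal Python sketch of the EM iteration for one process follows; it is illustrative, not the authors' implementation, and the function names are hypothetical. The E-step evaluates Equation 8 exactly as written (without weights), and the M-step evaluates Equations 6 and 7. Passing per-event weights $S_{i,k}$ switches the M-step to the weighted estimates of Equations 10 and 11, which reduce to Equations 6 and 7 when every weight is 1. Two choices are our own assumptions: because $\omega_k$ appears on both sides of Equation 7, its right-hand side is evaluated at the current $\omega_k$, and the freshly updated $\alpha_k$ is used in that update.

```python
import math
from typing import List, Optional, Sequence, Tuple


def e_step(times: Sequence[float], mu: float, alpha: float, omega: float
           ) -> Tuple[List[float], List[List[float]]]:
    """Equation 8 for one process with sorted event times.

    Returns p_bg[j] = P_jj (event j is a background event) and
    p_trig[i][j] = P_ij (event i triggered event j), filled only for i < j.
    """
    M = len(times)
    p_bg = [0.0] * M
    p_trig = [[0.0] * M for _ in range(M)]
    for j in range(M):
        kernel = [alpha * omega * math.exp(-omega * (times[j] - times[i])) for i in range(j)]
        lam = mu + sum(kernel)  # lambda_k(t_j | H) from Equation 1
        p_bg[j] = mu / lam
        for i in range(j):
            p_trig[i][j] = kernel[i] / lam
    return p_bg, p_trig


def m_step(times: Sequence[float], T: float, alpha: float, omega: float,
           p_bg: Sequence[float], p_trig: Sequence[Sequence[float]],
           weights: Optional[Sequence[float]] = None) -> Tuple[float, float, float]:
    """Equations 6-7, or the weighted Equations 10-11 when weights S_{i,k} are given."""
    M = len(times)
    S = [1.0] * M if weights is None else list(weights)
    mu_new = sum(S[i] * p_bg[i] for i in range(M)) / T
    num = 0.0  # sum_{i<j} S_i S_j P_ij
    lag = 0.0  # sum_{i<j} (t_j - t_i) S_i S_j P_ij
    for j in range(M):
        for i in range(j):
            w = S[i] * S[j] * p_trig[i][j]
            num += w
            lag += (times[j] - times[i]) * w
    tails = [T - t for t in times]
    # Boundary-corrected denominator: sum_i S_i (1 - exp(-omega (T - t_i))).
    alpha_new = num / sum(S[i] * (1.0 - math.exp(-omega * tails[i])) for i in range(M))
    boundary = sum(S[i] * tails[i] * math.exp(-omega * tails[i]) for i in range(M))
    # omega appears on both sides of Equation 7; the right-hand side uses the current omega.
    omega_new = num / (lag + alpha_new * boundary)
    return mu_new, alpha_new, omega_new


def fit(times: Sequence[float], T: float, mu: float, alpha: float, omega: float,
        weights: Optional[Sequence[float]] = None, n_iter: int = 200
        ) -> Tuple[float, float, float]:
    """Alternate the E-step and the M-step a fixed number of times."""
    for _ in range(n_iter):
        p_bg, p_trig = e_step(times, mu, alpha, omega)
        mu, alpha, omega = m_step(times, T, alpha, omega, p_bg, p_trig, weights)
    return mu, alpha, omega
```

Each event's probabilities sum to one ($P^k_{j,j} + \sum_{i<j} P^k_{i,j} = 1$), so with unit weights $\mu_k T$ plus the total trigger probability recovers $M_k$, which is a convenient sanity check.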
In the presence of events with unknown process affiliation in the network, we assign weights to the contribution of each event to the log-likelihood function. Specifically, each unknown event in process k has a weight $S_{i,k}$ such that $\sum_{k=1}^{K} S_{i,k} = 1$. For the known events, $S_{i,k} = 1$. These weights are incorporated for each process via

$$
\begin{aligned}
L_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k) ={}& \sum_{i=1}^{M_k} P^k_{i,i} S_{i,k} \log(\mu_k) - \int_0^T \mu_k \, dt \\
&+ \sum_{i=1}^{M_k - 1} \sum_{j=i+1}^{M_k} S_{i,k} S_{j,k} P^k_{i,j} \log\left(\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}\right) \\
&- \sum_{i=1}^{M_k} S_{i,k} \int_0^{T - t_i} \alpha_k \omega_k e^{-\omega_k s} \, ds.
\end{aligned}
\tag{9}
$$

<=i  ^ 


Note that $L_k(\mathcal{H}_{T,k} \mid \mu_k, \alpha_k, \omega_k)$ is no longer an EM log-likelihood in the presence of unknown data. Maximizing $L_k$ with respect to each of the parameters, the estimates become

$$\mu_k = \frac{\sum_{i=1}^{M_k} P^k_{i,i} S_{i,k}}{T}, \qquad \alpha_k = \frac{\sum_{i<j} P^k_{i,j} S_{i,k} S_{j,k}}{\sum_{i=1}^{M_k} S_{i,k} \left(1 - e^{-\omega_k (T - t_i)}\right)}, \tag{10}$$

$$\omega_k = \frac{\sum_{i<j} P^k_{i,j} S_{i,k} S_{j,k}}{\sum_{i<j} (t_j - t_i) P^k_{i,j} S_{i,k} S_{j,k} + \alpha_k \sum_{i=1}^{M_k} S_{i,k} (T - t_i) e^{-\omega_k (T - t_i)}}. \tag{11}$$

When all of the events are known, i.e., $S_{i,k} = 1$ when event i belongs to process k and $S_{i,k} = 0$ otherwise, these estimates become identical to the EM parameter estimates.

[Figure 3 plot: Estimates for $\mu$, true $\mu = 0.01$]

Figure 3 Plot of the parameter estimates for $\mu_k$ as the number of unknown events increases. Estimates of $\mu_k$ for Unknown Not Included (dash-triangle), Unknown Included (dash-square), Equal Weights (dash-x), ESA (dash-circle), and No Unknown (solid). In each of the three figures, the estimates are plotted against the number of unknown events. Each network has five processes with the true parameters $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. Each data point is the average of the results from 100 simulated networks.

Figure 4 Plot of the parameter estimates for $\alpha_k$ as the number of unknown events increases. Estimates of $\alpha_k$ for Unknown Not Included (dash-triangle), Unknown Included (dash-square), Equal Weights (dash-x), ESA (dash-circle), and No Unknown (solid). In each of the three figures, the estimates are plotted against the number of unknown events. Each network has five processes with the true parameters $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. Each data point is the average of the results from 100 simulated networks.

[Figure 5 plot: Estimates for $\omega$, true $\omega = 0.1$]

Figure 5 Plot of the parameter estimates for $\omega_k$ as the number of unknown events increases. Estimates of $\omega_k$ for Unknown Not Included (dash-triangle), Unknown Included (dash-square), Equal Weights (dash-x), ESA (dash-circle), and No Unknown (solid). In each of the three figures, the estimates are plotted against the number of unknown events. Each network has five processes with the true parameters $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. Each data point is the average of the results from 100 simulated networks.

Updating weights

At the start of the Estimation & Score Algorithm, all of the weights for the unknown events are $S_{i,k} = 1/K$. Once the parameters are estimated using the altered EM algorithm described in Equations 10 and 11, the weights $S_{i,k}$ are updated; see Figure 2. Here we present four different score functions and the Stomakhin-Short-Bertozzi method [24], each used to define the intermediate process affiliation $q_{i,k}$. Each of these score functions synthesizes information from different portions of the data set. For an event early in the data set, a score function that uses future events would be ideal; for later events, a score function that uses previous events is preferable. Similar considerations apply if some portions of the data contain more incomplete data than others. After all of the intermediate weights $q_{i,k}$ have been calculated, they are renormalized as a probability via

$$S_{i,k} = \frac{q_{i,k}}{\sum_{k=1}^{K} q_{i,k}}.$$

For simplicity we consider a response function of the form $g_k(t) = \alpha_k \omega_k e^{-\omega_k t}$.

Ratio Score Function

The Ratio score function considers the ratio of the summed response over all future events, $\sum_{j>i} g_k(t_j - t_i)$, to the background rate $\mu_k$. Mathematically, the score is

$$q^{\text{Ratio}}_{i,k} = \frac{\sum_{j>i} g_k(t_j - t_i)}{\mu_k}. \tag{12}$$

Lambda Score Function

The Lambda score function uses only previous information, taking the ratio of the intensities evaluated at the unknown event time:

$$q^{\text{Lambda}}_{i,k} = \frac{\lambda_k(t_i \mid \mathcal{H}_{T,k})}{\sum_{k'=1}^{K} \lambda_{k'}(t_i \mid \mathcal{H}_{T,k'})}. \tag{13}$$
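The Ratio and Lambda scores, together with the normalization that turns scores into weights, can be sketched as follows. This is a minimal illustration with hypothetical function names: `times` is one process's sorted event list (including the unknown events placed into it), `i` is the unknown event's local index in that process, and `g` is the exponential response function $g_k(t) = \alpha_k \omega_k e^{-\omega_k t}$.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float]  # (mu_k, alpha_k, omega_k)


def g(dt: float, alpha: float, omega: float) -> float:
    """Response function g_k(t) = alpha_k * omega_k * exp(-omega_k * t)."""
    return alpha * omega * math.exp(-omega * dt)


def ratio_score(times: Sequence[float], i: int, mu: float, alpha: float, omega: float) -> float:
    """Equation 12: response of event i summed over later events, divided by mu_k."""
    return sum(g(tj - times[i], alpha, omega) for tj in times[i + 1:]) / mu


def lambda_scores(t_i: float, histories: Sequence[Sequence[float]],
                  params: Sequence[Params]) -> List[float]:
    """Equation 13: lambda_k(t_i | H) divided by the sum of intensities over all K processes."""
    lams = [mu + sum(g(t_i - s, alpha, omega) for s in hist if s < t_i)
            for hist, (mu, alpha, omega) in zip(histories, params)]
    total = sum(lams)
    return [lam / total for lam in lams]


def normalize(q_row: Sequence[float]) -> List[float]:
    """S_{i,k} = q_{i,k} / sum_k q_{i,k}."""
    total = sum(q_row)
    return [q / total for q in q_row]


if __name__ == "__main__":
    procs = [[0.0, 1.0, 3.0], [1.0, 10.0]]  # the unknown event at t = 1.0 sits in both
    local_idx = [1, 0]
    params = [(0.5, 0.5, 1.0), (0.5, 0.5, 1.0)]
    q = [ratio_score(ts, i, *p) for ts, i, p in zip(procs, local_idx, params)]
    print(normalize(q), lambda_scores(1.0, procs, params))
```

The Lambda scores are already normalized across processes, whereas the Ratio scores must be passed through `normalize`.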

Stomakhin-Short-Bertozzi (SSB) method

The method defined in [24] is summarized by

(14)

subject to

$$\sum_{k=1}^{K} \left(q_{i,k}\right)^2 = 1. \tag{15}$$

This method is motivated by the Hawkes process defined in Equation 1.


Probability Score Function

The Probability score function uses the approximation of the branching structure of the underlying process. The idea behind this method is that background events with no corresponding response events should not belong to the process, whereas an event that is a background event with many response events, or an event that is a response to another event, should be part of that process.


Prob  _ 


pk. 


(16) 


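$$P^k_{i,i} = \frac{\mu_k}{\lambda_k(t_i \mid \mathcal{H}_{T,k})}, \qquad P^k_{i,j} = \frac{g_k(t_j - t_i)}{\lambda_k(t_j \mid \mathcal{H}_{T,k})}. \tag{17}$$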


Forward Backward Score Function

This method takes the ratio of the summed response over both future and past events, $\sum_{j \neq i} g_k(|t_i - t_j|)$, to the background rate $\mu_k$:

$$q^{\text{F/B}}_{i,k} = \frac{\sum_{j \neq i} g_k(|t_i - t_j|)}{\mu_k}. \tag{18}$$
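The Forward Backward score differs from the Ratio score only in summing the response over both earlier and later events. A hypothetical helper is sketched below, together with the per-event normalization across processes; `proc_times[k]` and `local_idx[k]` (the unknown event's index within process k) are assumed inputs, and the per-process parameters may differ.

```python
import math
from typing import List, Sequence, Tuple


def forward_backward_score(times: Sequence[float], i: int,
                           mu: float, alpha: float, omega: float) -> float:
    """Equation 18: sum over j != i of g_k(|t_i - t_j|), divided by mu_k."""
    ti = times[i]
    response = sum(alpha * omega * math.exp(-omega * abs(ti - tj))
                   for j, tj in enumerate(times) if j != i)
    return response / mu


def forward_backward_weights(proc_times: Sequence[Sequence[float]], local_idx: Sequence[int],
                             params: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Score one unknown event in every process and normalize across processes."""
    q = [forward_backward_score(ts, i, *p) for ts, i, p in zip(proc_times, local_idx, params)]
    total = sum(q)  # zero only if every process contains just this single event
    return [qk / total for qk in q]


if __name__ == "__main__":
    S = forward_backward_weights([[0.0, 1.0, 3.0], [1.0, 10.0]], [1, 0], [(0.5, 0.5, 1.0)] * 2)
    print(S)
```

In this toy call the unknown event at $t = 1$ has close neighbors in the first process and none in the second, so nearly all of its weight goes to the first process.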


Results

The Estimation & Score Algorithm is tested for accuracy on simulated data from the Hawkes process defined in Equation 1. The parameter estimation method outlined in Subsection “Parameter Estimation” is analyzed in Subsection “Estimation analysis”. A comparison of the score functions, assuming the true parameters, is found in Subsection “Updating Weights analysis”. Subsection “Runtime Analysis” compares the runtime of the Forward Backward score function and the Stomakhin-Short-Bertozzi method. An example of the convergence of the Estimation & Score Algorithm is provided in Subsection “Convergence Results”.


Table 1 Average and standard deviation of $\mu_k$ over 100 networks; the true value is $\mu_k = 0.01$

Number of unknown events          15         30         45         60         75
Equal Weights (Ave)               0.0102     0.0098     0.0100     0.0099     0.0096
Equal Weights (StDev)            ±0.0014    ±0.0014    ±0.0017    ±0.0015    ±0.0015
ESA (Ave)                         0.0099     0.0093     0.0093     0.0091     0.0086
ESA (StDev)                      ±0.0014    ±0.0014    ±0.0017    ±0.0014    ±0.0014
Unknown Not Included (Ave)        0.0098     0.0091     0.0089     0.0085     0.0079
Unknown Not Included (StDev)     ±0.0014    ±0.0014    ±0.0017    ±0.0015    ±0.0015
Unknown Included (Ave)            0.0117     0.0129     0.0143     0.0157     0.0167
Unknown Included (StDev)         ±0.0014    ±0.0017    ±0.0019    ±0.0016    ±0.0019
No Unknown (Ave)                  0.0100     0.0095     0.0094     0.0093     0.0088
No Unknown (StDev)               ±0.0014    ±0.0014    ±0.0017    ±0.0015    ±0.0015
Table 2 Average and standard deviation of $\alpha_k$ over 100 networks; the true value is $\alpha_k = 0.5$

Number of unknown events          15         30         45         60         75
Equal Weights (Ave)               0.4678     0.4573     0.4340     0.4220     0.3989
Equal Weights (StDev)            ±0.0636    ±0.0759    ±0.0686    ±0.0726    ±0.0699
ESA (Ave)                         0.4853     0.4903     0.4795     0.4786     0.4642
ESA (StDev)                      ±0.0640    ±0.0767    ±0.0712    ±0.0741    ±0.0719
Unknown Not Included (Ave)        0.4712     0.4646     0.4429     0.4348     0.4132
Unknown Not Included (StDev)     ±0.0638    ±0.0779    ±0.0700    ±0.0737    ±0.0702
Unknown Included (Ave)            0.4580     0.4364     0.4172     0.4032     0.3855
Unknown Included (StDev)         ±0.0668    ±0.0822    ±0.0705    ±0.0799    ±0.0818
No Unknown (Ave)                  0.4820     0.4838     0.4741     0.4750     0.4595
No Unknown (StDev)               ±0.0647    ±0.0759    ±0.0726    ±0.0748    ±0.0689

Estimation analysis

There are many ways we could allow the unknown events to influence our estimates of the underlying parameters of each process, and there are two extremes. On the one hand, we could exclude all of the unknown events from the parameter estimation. This is equivalent to setting $S_{i,k} = 0$ for all unknown events. On the other hand, we could include all of the unknown events in the estimation of the parameters of each process. This is equivalent to setting $S_{i,k} = 1$ for all i and k. Another possible estimation method is some combination of these two, which allows the unaffiliated events to play some role in the estimation process. The naive choice is to let each event play the same role in each process, which amounts to setting $S_{i,k} = 1/K$ for the unknown events. We compare these three choices to the estimates obtained by the Estimation & Score Algorithm (ESA) using the Forward Backward score function. Finally, we compare all four of these estimation techniques to the best we could possibly do, which here means knowing the affiliations of all the events (i.e., there are no unknown events).


Figures 3, 4, and 5 display the estimates of $\mu_k$, $\alpha_k$, and $\omega_k$ for the five cases: $S_{i,k} = 0$ for unknown events (dash-triangle), $S_{i,k} = 1$ for unknown events (dash-square), $S_{i,k} = 1/K$ for unknown events (dash-x), the ESA (dash-circle), and the estimates obtained when the affiliations of all the unknown events are known (solid). These results, with standard deviations, are displayed in Tables 1, 2, and 3. In each of the three figures, the estimates are plotted against the number of unknown events. Each network has five processes with the true parameters $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. Different networks are created with 15, 30, 45, 60, and 75 unknown events. We estimate the parameters using each of the five methods explained above. This procedure is repeated 100 times with different random seeds, and the average estimate is calculated.

Table 3 Average and standard deviation of $\omega_k$ over 100 networks; the true value is $\omega_k = 0.1$

Number of unknown events          15         30         45         60         75
Equal Weights (Ave)               0.1070     0.1041     0.1042     0.1051     0.10364
Equal Weights (StDev)            ±0.0264    ±0.0274    ±0.0262    ±0.0248    ±0.0255
ESA (Ave)                         0.1069     0.1042     0.1039     0.1059     0.1045
ESA (StDev)                      ±0.0263    ±0.0273    ±0.0264    ±0.0255    ±0.0240
Unknown Not Included (Ave)        0.1075     0.1054     0.1063     0.1060     0.1054
Unknown Not Included (StDev)     ±0.0264    ±0.0286    ±0.0269    ±0.0246    ±0.0269
Unknown Included (Ave)            0.1048     0.1101     0.0988     0.1022     0.0993
Unknown Included (StDev)         ±0.0275    ±0.1035    ±0.0273    ±0.0301    ±0.0285
No Unknown (Ave)                  0.1078     0.1057     0.1055     0.1070     0.1054
No Unknown (StDev)               ±0.0265    ±0.0277    ±0.0256    ±0.0241    ±0.0230

Notice in the estimates of $\mu_k$ in Figure 3 and Table 1 that the ESA estimates most closely track the best achievable estimates (No Unknown), with only a slight reduction in accuracy as the number of unknown events increases. On average, the other three estimates drift further from the No Unknown estimates as the number of unknown events increases. When $S_{i,k} = 1$, the estimates of $\mu_k$ are far above the true value and grow as the number of unknown events increases. This follows from the fact that letting $S_{i,k} = 1$ effectively adds events to the network. Take the case K = 2, and assume that each process has 1000 events, 100 of which are unknown. When we estimate the parameters of the first process, we use its 900 known events plus all 200 unknown events in the network, for a total of 1100 events. The same is true of the estimation for process two. This creates 200 spurious events and thus biases the estimates of $\mu_k$ upward. This motivated the idea of weighting each unknown event equally, and that choice is validated by the estimates of $\mu_k$. A similar argument shows why $S_{i,k} = 0$ (i.e., ignoring all the unknown events) gives the lowest estimate of $\mu_k$ at each level of incomplete data: each process is then estimated from only 900 of its 1000 events.
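The event-count arithmetic of the K = 2 example can be checked with a one-line helper. This is only a bookkeeping sketch (the function name is hypothetical): it counts the weighted events that enter the estimate for one process, assumes the unknown events are spread evenly over the processes, and ignores the branching probabilities that also appear in Equation 10.

```python
def effective_count(n_events: int, n_unknown: int, K: int, weight: float) -> float:
    """Weighted number of events used to estimate one process.

    Every process has n_events events, n_unknown of which are unknown; each unknown
    event in the network is placed into every process with the same weight.
    """
    known = n_events - n_unknown  # known events that stay in this process
    pooled = K * n_unknown        # all unknown events in the network
    return known + weight * pooled


if __name__ == "__main__":
    for label, w in [("S = 1", 1.0), ("S = 1/K", 0.5), ("S = 0", 0.0)]:
        print(label, effective_count(1000, 100, 2, w))
```

With $S_{i,k} = 1$ each process is estimated from 1100 weighted events, with $S_{i,k} = 0$ from 900, and with $S_{i,k} = 1/K$ from exactly 1000, which matches the true process size.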

In the estimates of the branching ratio $\alpha_k$ (Figure 4 and Table 2), the ESA on average yields the best estimates and maintains its accuracy in the presence of unknown events. Interestingly, equal weighting performs worse here than setting $S_{i,k} = 0$ for all unknown events; using the ESA overcomes this drawback. Again, setting $S_{i,k} = 1$ for all unknown events performs the worst. This may stem from the fact that most of the unknown events are then labeled as background events, so this estimation technique underestimates the branching ratio because fewer events are considered offspring. Notice that the ESA estimate (dash-circle) tracks the best possible estimate (solid) well, while the other three start to trail off as more and more information is labeled as unknown.

Finally, Figure 5 and Table 1 show that the ESA estimate (dash-circle) for $\omega_k$ tracks the behavior of the best estimate (solid) more closely than the other methods do. Including all of the unknown events (dash-square) provides the poorest estimate for $\omega_k$; the other three estimation techniques are comparable.

Updating Weights Analysis

To understand the strengths and weaknesses of each of the five score functions defined in Subsection “Updating weights”, the score functions were evaluated on 100 incomplete events using the true values of $\mu_k$, $\omega_k$, and $\alpha_k$, taking the Top 1, Top 2, and Top 3 best inferences. For comparison with [24], the true parameters were taken to be $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. Due to the stochastic nature of the processes, 100 random networks were tested for each number of processes. The average results of this analysis are shown in Figure 6: the vertical axis gives the number of unknown events correctly identified by each score function, and the horizontal axis gives the number of processes in the network.
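The Top 1, Top 2, and Top 3 counts can be computed as in the following Python sketch. It assumes that each score function returns one score per process for every unknown event, and that "chance" means guessing distinct processes uniformly at random; both function names are hypothetical.

```python
def top_n_correct(scores, true_labels, n):
    """Count unknown events whose true process is among the n highest-scoring processes.

    scores[i][k] is the score of process k for unknown event i;
    true_labels[i] is the index of the process that actually generated event i.
    """
    correct = 0
    for row, truth in zip(scores, true_labels):
        # Rank process indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        if truth in ranked[:n]:
            correct += 1
    return correct


def chance_top_n(num_events, num_processes, n):
    """Expected number correct when n distinct processes are guessed uniformly at random."""
    return num_events * min(n, num_processes) / num_processes


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print([top_n_correct(scores, labels, n) for n in (1, 2, 3)])  # [1, 2, 3]
print(chance_top_n(100, 10, 3))  # 30.0
```

Under this assumption, a chance baseline with 100 unknown events and 10 processes would identify about 10, 20, and 30 events for Top 1, Top 2, and Top 3.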

From Figure 6 it is clear that the Stomakhin-Short-Bertozzi score function (solid dark blue) and the Forward Backward score function (cyan dashed diamond) perform nearly identically for the Top 1, Top 2, and Top 3 inferences. These functions look both forward and backward in time from the incomplete event and are therefore able to identify clusters of events in time. The Probability (black dashed asterisk) and Ratio (solid green square) score functions do not perform nearly as well as the Stomakhin-Short-Bertozzi and Forward Backward score functions, but they outperform the Lambda (magenta dashed circle) score function. The Lambda score function appears to perform close to chance (solid dark green plus) for the Top 1, Top 2, and Top 3 inferred process affiliations. Because the Forward Backward score function and the Stomakhin-Short-Bertozzi method perform best, only these two are used in the further analysis.

[Figure 6 panels: Top 1, Top 2, Top 3; horizontal axis: Number of Processes]

Figure 6 Number of correctly identified unknown events out of 100. Display of the number of correctly identified unknown events when the Top 1, Top 2, and Top 3 inferences are taken into consideration. For all score functions, the parameters are $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$, and are assumed to be known. The Stomakhin-Short-Bertozzi (solid dark blue x) and Forward Backward (cyan dashed diamond) score functions produce comparable results with these parameters, as do the Probability (black dashed asterisk) and Ratio (solid green square) score functions, and the Lambda score function (magenta dashed circle) and chance (solid dark green plus).

The analysis comparing the score functions assumed that the true parameters were known. In practice, however, the estimated parameters will contain errors, and these errors propagate to the score functions. To understand how deviations of the estimated parameters influence the score functions, pairwise combinations of the parameters were varied by up to ±90% of the target values $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$ in 10% increments. In particular, the Forward Backward and SSB score functions are computed for pairwise combinations of $\mu$ in the range $[0.001, 0.019]$, $\omega$ in the range $[0.01, 0.19]$, and $\alpha$ in the range $[0.05, 0.95]$, with the third parameter kept at its target value. Note that a 90% change is larger than the errors observed in the parameter estimates in Subsection “Estimation analysis”.
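The pairwise sweep described above can be enumerated with a short Python sketch. The dictionary keys and function names are hypothetical; each offset multiplies the target value by $1 \pm$ a percentage running from 90% down to 0 and back up to 90% in 10% steps, and the parameter not in the pair stays at its target.

```python
from itertools import combinations, product

TARGETS = {"mu": 0.01, "omega": 0.1, "alpha": 0.5}


def offset_values(target):
    """Target changed by -90%, -80%, ..., 0%, ..., +90% (19 values)."""
    return [target * (1 + pct / 100) for pct in range(-90, 100, 10)]


def pairwise_grid(targets):
    """Yield ((p, q), params) where p and q are offset and the third parameter
    is held at its target value."""
    for p, q in combinations(targets, 2):
        for vp, vq in product(offset_values(targets[p]), offset_values(targets[q])):
            params = dict(targets)
            params[p], params[q] = vp, vq
            yield (p, q), params


mu_vals = offset_values(TARGETS["mu"])
print(len(mu_vals), min(mu_vals), max(mu_vals))  # 19 values spanning about 0.001 to 0.019
print(sum(1 for _ in pairwise_grid(TARGETS)))    # 3 pairs x 19 x 19 settings
```

Each yielded parameter set would then be passed to the score function under study, and the resulting scores compared with those at the target values.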


To examine how errors in the parameters propagate to the score functions, one event from a network with 10 processes is chosen to be unknown. The score function $S_{i,\text{true}}$ for the true process is calculated with the target parameters $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. Then, on the same network, the parameters are offset by

$$\widehat{\text{parameter}} = \text{parameter} \pm \%\text{change} \cdot \text{parameter}. \qquad (19)$$

The offset score function $\hat{S}_{i,\text{true}}$ is calculated from these offset parameters, and the difference $S_{i,\text{true}} - \hat{S}_{i,\text{true}}$ is taken for each pairwise combination of parameters. Again, due to the stochastic nature of the processes, each analysis was repeated for 100 runs and the average difference in the score functions was recorded. The results are displayed in Figure 7, with the Forward Backward score function on the left and the Stomakhin-Short-Bertozzi score function on the right. In general, the Stomakhin-Short-Bertozzi score function is more sensitive than the Forward Backward score function to changes in $\mu_k$ and $\alpha_k$. Changes in the Forward Backward score function are minimal for most parameter changes except for small values of $\omega_k$: as $\omega_k$ decreases, the approximated Forward Backward score function decreases, causing a positive difference. As seen in Subsection “Estimation analysis”, Figure 5, estimates of $\omega_k$ tend to be too high rather than too low, so this regime does not appear to be reached for these parameters. The changes in the Stomakhin-Short-Bertozzi score function depend on all of the pairwise parameter changes. As $\mu_k$ increases, the computed Stomakhin-Short-Bertozzi score decreases; on the other hand, as $\omega_k$ or $\alpha_k$ increases, the score increases. This analysis shows that although the Stomakhin-Short-Bertozzi method and the Forward Backward score function perform similarly when the parameters are known exactly, under estimation error the Stomakhin-Short-Bertozzi score function varies more than the Forward Backward score function.

[Figure 7 panels: FB Method: μ vs α; SSB Method: μ vs α; FB Method: μ vs ω; SSB Method: μ vs ω; FB Method: ω vs α; SSB Method: ω vs α]

Figure 7 Error propagation. The average difference of the Forward Backward and Stomakhin-Short-Bertozzi score functions with the parameters varied by ±90% of the target values $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$.

Runtime  Analysis 

Though  the  Forward  Backward  score  function  and  the  SSB 
method  produce  comparable  results  in  terms  of  accuracy, 
there  is  a  sizable  difference  in  the  time  it  takes  to  update 
the  weights  using  these  methods. 

The Forward Backward score function is designed to be direct, meaning that it calculates the weights from the available information without any need for iteration. The Stomakhin-Short-Bertozzi method, however, determines the weights by solving an optimization problem. A closed-form solution for the maximizing weights is not known to the authors, so the weights are found by numerically approximating the weights that maximize Equation 14. In our implementation of the Stomakhin-Short-Bertozzi method we employ a gradient ascent method, which requires 4 to 11 iterations to converge with a tolerance of 0.001. The direct methods (the Forward Backward, Probability, Ratio, and Lambda score functions) require on the order of as many operations as one iteration of the gradient ascent used to solve Equation 14. Specifically, one iteration of the gradient ascent method and the calculation of the direct score functions are both $O(N \cdot K \cdot M)$, where $N$ is the number of unknown events, $K$ is the number of processes, and $M$ is the expected number of events in process $k$. The expected number of events in process $k$ can be written as

$$M = \mathbb{E}[M_k] = \frac{\mu_k \, T}{1 - \alpha_k}. \qquad (20)$$
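To make the cost estimate concrete, the Python sketch below compares the stationary mean count $\mu_k T/(1-\alpha_k)$ of a Hawkes process with simulated counts. It assumes a triggering kernel $\alpha_k \omega_k e^{-\omega_k (t - t_j)}$, i.e. each event has on average $\alpha_k$ offspring after exponentially distributed delays with rate $\omega_k$, and simulates through the branching (cluster) representation rather than any procedure from the paper. The window length $T$, the function names, and the operation-count helper are hypothetical choices for illustration.

```python
import random


def expected_events(mu, alpha, T):
    """Stationary mean number of events on [0, T]: mu * T / (1 - alpha)."""
    return mu * T / (1.0 - alpha)


def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals before time lam."""
    count, s = 0, rng.expovariate(1.0)
    while s < lam:
        count += 1
        s += rng.expovariate(1.0)
    return count


def simulate_count(mu, alpha, omega, T, rng):
    """Number of events of an exponential-kernel Hawkes process on [0, T],
    simulated with the branching (cluster) representation."""
    # Background events: homogeneous Poisson process with rate mu.
    pending, t = [], rng.expovariate(mu)
    while t < T:
        pending.append(t)
        t += rng.expovariate(mu)
    total = 0
    while pending:
        parent = pending.pop()
        total += 1
        # Each event triggers Poisson(alpha) offspring after Exp(omega) delays;
        # offspring falling after T are not observed.
        for _ in range(poisson(alpha, rng)):
            child = parent + rng.expovariate(omega)
            if child < T:
                pending.append(child)
    return total


def score_cost(N, K, M, iterations=1):
    """Illustrative operation count: O(N * K * M) per pass, times the number of passes."""
    return iterations * N * K * M


rng = random.Random(0)
mu, alpha, omega, T = 0.01, 0.5, 0.1, 100_000.0
runs = [simulate_count(mu, alpha, omega, T, rng) for _ in range(5)]
M = expected_events(mu, alpha, T)
print(M, sum(runs) / len(runs))
# One direct score evaluation vs. 11 gradient-ascent iterations for N = 100, K = 10.
print(score_cost(100, 10, M), score_cost(100, 10, M, iterations=11))
```

Because one gradient-ascent iteration costs about as much as one direct evaluation, the 4 to 11 iterations reported above translate into roughly a 4 to 11 times larger operation count for the iterative method under this simple model.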

The run times of the Forward Backward function and the Stomakhin-Short-Bertozzi method are examined empirically in Figure 8. Both score functions were calculated on 20 networks for each combination of the number of unknown events and the number of processes, with the known parameter values $\mu_k = 0.01$, $\omega_k = 0.1$, and $\alpha_k = 0.5$. All run times are measured in milliseconds. The average run time needed to compute the Forward Backward function at every level of $N$ and $K$ is substantially less than that of the Stomakhin-Short-Bertozzi method. The figure also shows that the time needed to calculate both methods increases as $N$ and $K$ increase.

Convergence  Results 

The Estimate & Score Algorithm converges quickly when either the Forward Backward score function or the Stomakhin-Short-Bertozzi method is used. Figure 9 displays the parameter estimates for a typical run of the Estimate & Score Algorithm for both the Forward Backward (left) and Stomakhin-Short-Bertozzi (right) methods. Both score functions produce qualitatively similar results, and the rate of convergence appears comparable in both cases. For this typical run, the estimated weights of one unknown event for each process are plotted against the iteration number in Figure 10, using the Forward Backward score function (left) and the Stomakhin-Short-Bertozzi method (right). Interestingly, both weighting methods choose the same process affiliation as the most likely. Further tests were conducted with varied initial weightings. These runs behaved similarly to runs initialized with $S_{i,k} = 1/K$, suggesting that the Estimate & Score Algorithm is robust to small perturbations of the initial weighting.
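The alternating structure of the algorithm can be summarized in a short Python skeleton. This is a sketch, not the authors' implementation: the parameter estimator and the score function are passed in as hypothetical callables, the weights start at $S_{i,k} = 1/K$ as in the runs described above, and the stopping rule (largest change in any weight below a tolerance) is an assumption.

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")             # parameter container, left generic on purpose
Weights = List[List[float]]  # weights[i][k]: weight of unknown event i for process k


def estimate_and_score(
    estimate: Callable[[Weights], P],
    score: Callable[[P], Weights],
    num_unknown: int,
    K: int,
    tol: float = 1e-3,
    max_iter: int = 100,
) -> Tuple[P, Weights, int]:
    """Alternate parameter estimation and weight updates until the weights settle."""
    # Uniform initial weighting S_{i,k} = 1/K for every unknown event.
    weights: Weights = [[1.0 / K] * K for _ in range(num_unknown)]
    params = estimate(weights)
    for iteration in range(1, max_iter + 1):
        new_weights = score(params)  # score step: update affiliation weights
        change = max(
            (abs(a - b) for old, new in zip(weights, new_weights) for a, b in zip(old, new)),
            default=0.0,
        )
        weights = new_weights
        params = estimate(weights)  # estimation step with the updated weights
        if change < tol:
            return params, weights, iteration
    return params, weights, max_iter


# Toy usage: one unknown event, two processes, hypothetical callables whose fixed point is 0.8.
def toy_estimate(w: Weights) -> float:
    return w[0][0]


def toy_score(p: float) -> Weights:
    return [[0.4 + 0.5 * p, 0.6 - 0.5 * p]]


params, weights, iterations = estimate_and_score(toy_estimate, toy_score, num_unknown=1, K=2)
print(iterations, params, weights)
```

Keeping the estimator and score function as arguments makes it easy to swap a direct score function for an iterative one without changing the outer loop.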
Figure 9 Parameter estimates vs. the iteration number. Plots of the parameter estimates for a typical run of the Estimate & Score Algorithm using the Forward Backward (left) and Stomakhin-Short-Bertozzi (right) methods. Both methods compute nearly identical estimates of the parameters for each of the ten processes. Event 99 was chosen at random for plotting.
Figure 10 Estimated weights vs. iteration. Plots of the weights for one unknown event computed by a typical run of the Estimate & Score Algorithm using the Forward Backward (left) and Stomakhin-Short-Bertozzi (right) methods. Event 99 was chosen at random for plotting.


Hegemann et al. Security Informatics 2013, 2:1
http://www.security-informatics.com/content/2/1/1




Discussion  and  Future  Work 

In this paper we propose an effective method for simultaneously estimating the parameters and assigning process affiliations in the case of incomplete field data from self-exciting point processes on a network. This problem is motivated by the need of law enforcement agencies to identify gang affiliation in unsolved crimes in an area of highly complex gang rivalry activity. We present a new framework, which we name the Estimate & Score Algorithm, for possible application to field data. By testing the method on simulated datasets we can understand its performance features and liabilities. The method is an iterative procedure in which process parameters are estimated alternately with the calculation of network affiliation probabilities. We identify several useful score functions for calculating the network affiliations. We also compare ways of using unknown events in the parameter estimation. One upshot of our analysis is that the inclusion of unknown events may increase the accuracy of the parameter estimation. Of the several score functions considered, the Forward Backward score function shows the most promise, with results comparable to those of the Stomakhin-Short-Bertozzi method of [24] in the parameter regime tested. The Forward Backward score function is a direct calculation that does not rely on solving a variational problem, and is thus more computationally efficient than the method of [24].

Turning to future work, space often plays a role in understanding criminal activity [8,30-33]. Further, criminal behavior has non-random structure and can often be framed in terms of routine activity theory [34,35]. In the case of gang violence, there is a strong spatial component [1,10,36]. One can extend the Estimate & Score Algorithm to include space. There is precedent in the earthquake literature for adding space to self-exciting point processes [13,15]; however, in the case of gang violence, the spatial response may be different. Instead of retaliatory events clustering around prior events, the data appear to cluster around regions in space. A spatial model similar to that of [37] could be employed, in which the triggering density in space is related to each gang's set-space, or center of activity [38]. Statistically, when modeling spatial point processes one needs to tease out the difference between hot spots due to risk heterogeneity and those due to event dependence. The data given will be one realization of the underlying process; however, using techniques such as prototyping [39], one could potentially reformulate the data into multiple realizations of the same process and distinguish between these two phenomena.

There are other factors in the data that could be fused into the model, though more analysis would be required. For example, in earthquake modeling the magnitude of the earthquake is often included. To include such a factor in the intensity $\lambda_k(t \mid H_t)$, one would need to determine a numerical metric defining the impact of each event type. This is not a straightforward task and would require further investigation. Extending the model in this way could allow for the inclusion of events involving tagging or other low-level gang crimes, which could be precursors to more violent interactions between gangs. Including these data is outside the scope of the current model, but has strong potential to enrich the overall data set and allow for better analysis.

It is important to note that there are other methods to approximate the underlying form of the self-exciting process. For example, the authors of [28] consider the general form of the intensity function $\lambda(t \mid H_t)$ to be

$$\lambda(t \mid H_t) = \mu(t) + \alpha \sum_{t > t_j} g(t - t_j). \qquad (21)$$

Using a non-parametric method, they are able to approximate the background function $\mu(t)$ and the response function $g(t)$ for a broader class of functions. In this paper, the data was assumed to come from a Hawkes process with a constant background rate and an exponential response to previous events. There are cases where the background rate is not constant [40]. Further, it is conceivable that the response function could take a form other than exponential decay. In these circumstances, the model for $\lambda(t \mid H_t)$ in Equation 1 would not be appropriate.
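To illustrate the difference between the two modeling choices, the Python sketch below evaluates the general intensity of Equation 21 for a user-supplied background $\mu(t)$ and response $g$. Choosing a constant background and $g(s) = \omega e^{-\omega s}$ gives a constant-background, exponential-response model of the kind assumed in this paper; the kernel normalization (so that $\alpha$ remains the branching ratio), the periodic background, and all function names are assumptions made for illustration.

```python
import math
from typing import Callable, Iterable


def intensity(t: float, history: Iterable[float], mu: Callable[[float], float],
              alpha: float, g: Callable[[float], float]) -> float:
    """General self-exciting intensity: mu(t) + alpha * sum over past events t_j < t of g(t - t_j)."""
    return mu(t) + alpha * sum(g(t - tj) for tj in history if tj < t)


def exponential_response(omega: float) -> Callable[[float], float]:
    """g(s) = omega * exp(-omega * s), which integrates to 1 over s >= 0."""
    return lambda s: omega * math.exp(-omega * s)


def constant_background(mu0: float) -> Callable[[float], float]:
    return lambda t: mu0


def weekly_background(t: float) -> float:
    """Hypothetical non-constant background with a 7-day cycle (t in days)."""
    return 0.01 * (1 + 0.5 * math.sin(2 * math.pi * t / 7))


history = [1.0, 4.0, 4.5]
lam_hawkes = intensity(5.0, history, constant_background(0.01), 0.5, exponential_response(0.1))
lam_periodic = intensity(5.0, history, weekly_background, 0.5, exponential_response(0.1))
print(lam_hawkes, lam_periodic)
```

Only the two callables change between the models, which mirrors how a non-parametric estimate of $\mu(t)$ or $g$ could replace the constant and exponential forms.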

Finally, this method has great potential in the field of policing. Once such a model has been calibrated correctly, the Estimate & Score Algorithm with the faster Forward Backward score function can be used to infer gang association in real time, while an investigation is ongoing. Given an accurate model of the underlying process, such a method could identify rivalries that have heightened activity.